Roots & Patterns vs. Stems plus Grammar-Lexis Specifications: on what basis should a multilingual lexical database centred on Arabic be built?

Authors

  • Joseph Dichy
  • Ali Farghaly
Abstract

Machine translation engines draw on various types of databases. This paper is concerned with Arabic as a source or target language, and focuses on lexical databases. The non-concatenative nature of Arabic morphology, the complex structure of Arabic word-forms, and the general use of vowel-free writing present a real challenge to NLP developers. We show here how and why a stem-grounded lexical database, the items of which are associated with grammar-lexis specifications (as opposed to a root-and-pattern database), is motivated both linguistically and with regard to efficiency, economy and modularity. The arguments in favour of databases relying on stems associated with grammar-lexis specifications (such as DIINAR.1 or the Arabic dB under development at SYSTRAN), rather than on roots and patterns, are the following: (a) Root-and-pattern databases include huge numbers of rule-generated word-forms that do not actually appear in the language. (b) Rule-generated lemmas, as opposed to existing ones, are widely under-specified with regard to grammar-lexis relations. (c) In a Semitic language such as Arabic, the mapping of the grammar-lexis specifications that need to be associated with every lexical entry of the database is decisive. (d) These specifications can only be included in a stem-based dB. Points (a) to (d) are crucial in the context of machine translation involving Arabic.
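The contrast drawn in points (a) to (d) can be sketched in a few lines of Python. This is only an illustrative toy, not the design of DIINAR.1 or of the SYSTRAN database: the transliterated roots, the pattern templates, the lexicon entries and the specification fields (`pos`, `transitive`) are all assumptions made for the example, and the toy lexicon is deliberately incomplete, so a form missing from it is not thereby claimed to be absent from Arabic.

```python
from itertools import product

# Toy data in Latin transliteration (assumed for illustration only).
ROOTS = ["ktb", "drs"]
# Digits 1, 2, 3 mark the slots for the three root consonants.
PATTERNS = ["1a2a3a", "1aa2i3", "ma12uu3", "1i2aa3"]


def apply_pattern(root, pattern):
    """Interdigitate a triconsonantal root into a pattern template."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


def generate_all(roots, patterns):
    """Root-and-pattern approach: combine every root with every pattern."""
    return {apply_pattern(r, p) for r, p in product(roots, patterns)}


# Stem-based approach: only listed stems exist in the database, and each
# one carries its own (hypothetical) grammar-lexis specifications.
STEM_LEXICON = {
    "kataba": {"pos": "verb", "transitive": True},
    "kaatib": {"pos": "noun"},
    "maktuub": {"pos": "adjective"},
    "kitaab": {"pos": "noun"},
    "darasa": {"pos": "verb", "transitive": True},
    "daaris": {"pos": "noun"},
}

generated = generate_all(ROOTS, PATTERNS)

# Rule-generated forms with no entry in the toy lexicon: the generator
# cannot say whether they are in use, nor attach specifications to them.
unlisted = sorted(generated - STEM_LEXICON.keys())

if __name__ == "__main__":
    print(len(generated), "generated forms;", len(unlisted), "not in the lexicon")
    print(unlisted)
```

The generator produces a form for every root and pattern pair whether or not anyone has recorded it, and it has nowhere to store per-entry information. The stem lexicon lists fewer items, but each item is a natural place to attach the specifications that points (c) and (d) call for.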

Similar resources

On lemmatization in Arabic,

This work is a ‘prospective extension’ of the lexical work achieved in the DIINAR-MBC Euro-Mediterranean project. It aims at contributing to the crucial issue in the field of Arabic NLP of the operations involved in lemmatization, which are necessarily based on a definition of the Arabic entries of a monolingual or multilingual lexical database. As shown in previous work, lexical entries can be...

Systematic Verb Stem Generation For Arabic

Performing root-based searching, concordancing, and grammar checking in Arabic requires an efficient method for matching stems with roots and vice versa. Such mapping is complicated by the hundreds of manifestations of the same root. An algorithm based on the generation method used by native speakers is proposed here to provide a mapping from roots to stems. Verb roots are classified by the typ...

The Architecture Of A Standard Arabic Lexical Database: Some Figures, Ratios And Categories From The DIINAR.1 Source Program

This paper is a contribution to the issue – which has, in the course of the last decade, become critical – of the basic requirements and validation criteria for lexical language resources in Standard Arabic. The work is based on a critical analysis of the architecture of the DIINAR.1 lexical database, the entries of which are associated with grammar-lexis relations operating at word-form level ...

Tracking Morphophonemic Transformation in Arabic Word Generation and Root Extraction

Performing root-based searching, concordancing, and grammar checking in Arabic requires an efficient method for matching stems with roots and vice versa. Such mapping is complicated by the hundreds of manifestations of the same root; the radicals often undergo replacement, fusion, inversion, and/or deletion. It is a challenge, therefore, to keep track of original radicals. An algorithm based on...

Rules and Exceptions: the Junggrammatiker and Verner's Law

Modern Standard Arabic is a language noted for its regularity. Is it, therefore, a suitable candidate for examination using Beedham's Method of Lexical Exceptions (Beedham 2005)? The majority of Arabic verbs are derived from triliteral (triconsonantal) roots which are slotted into a small, finite number of patterns forming the basis of the lexical system. However, a brief survey of Arabic gramm...

Publication date: 2003